April 1999

XML for the absolute beginner

A guided tour from HTML to processing XML with Java

Summary
In just a few short years, the World Wide Web and HTML have taken the world by storm. But HTML's limitations and the ever-increasing demand for more flexibility in Internet systems have XML, the Extensible Markup Language, brewing on the horizon. Further, Java applications that move data around need a data representation format as portable as Java itself. Developers who learn XML now will find it a powerful tool for data representation, storage, modeling, and interoperation.

Mark Johnson steps away from his popular JavaBeans column this month to introduce you to the world of XML: where it came from, why it's necessary, how it interoperates with existing Internet technology, and how to use it in your designs. You'll learn about Cascading Style Sheets and XSL, then follow up with a look at the XML and Java technology base at a promising Internet startup, with comments from that company's CEO and technical lead. By the time you've finished reading Mark's article, you'll understand why so many people are paying so much attention to this new data representation standard. (11,000 words)

By Mark Johnson



HTML and the World Wide Web are everywhere. As an example of their ubiquity, I'm going to Central America for Easter this year, and if I want to, I'll be able to surf the Web, read my e-mail, and even do online banking from Internet cafés in Antigua Guatemala and Belize City. (I don't intend to, however, since doing so would take time away from a date I have with a palm tree and a rum-filled coconut.)

And yet, despite the omnipresence and popularity of HTML, it is severely limited in what it can do. It's fine for disseminating informal documents, but HTML is now being used to do things it was never designed for. Trying to build heavy-duty, flexible, interoperable data systems from HTML is like trying to build an aircraft carrier with hacksaws and soldering irons: the tools (HTML and HTTP) just aren't up to the job.

The good news is that many of the limitations of HTML have been overcome in XML, the Extensible Markup Language. XML is easily comprehensible to anyone who understands HTML, but it is much more powerful. More than just a markup language, XML is a metalanguage -- a language used to define new markup languages. With XML, you can create a language crafted specifically for your application or domain.

XML will complement, rather than replace, HTML. Whereas HTML is used for formatting and displaying data, XML represents the contextual meaning of the data.
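
To make that contrast concrete, here is a minimal sketch in Java. Everything in it is an assumption made for illustration: the class name MarkupContrast, the drink data, and the made-up tag names (drink, name, ingredient) are all hypothetical, and the parser is the JAXP API bundled with current JDKs rather than any tool discussed in this article. The HTML fragment says only how the text should look; the XML fragment says what each piece of text is, so a program can ask for it by name.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class MarkupContrast {
    // Hypothetical HTML: the tags describe appearance (bold, italic), not meaning.
    public static final String HTML =
        "<p><b>Rum Coconut</b><br><i>1 coconut, 2 oz rum</i></p>";

    // Hypothetical XML: made-up tags that name what each piece of data is.
    public static final String XML =
        "<drink>"
      + "<name>Rum Coconut</name>"
      + "<ingredient quantity=\"1\">coconut</ingredient>"
      + "<ingredient quantity=\"2\" units=\"oz\">rum</ingredient>"
      + "</drink>";

    // Parse an XML string into an in-memory document tree.
    public static Document parse(String xml) {
        try {
            DocumentBuilder builder =
                DocumentBuilderFactory.newInstance().newDocumentBuilder();
            return builder.parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalArgumentException("Could not parse XML", e);
        }
    }

    // Return the text of the first element with the given tag, or null if absent.
    public static String firstText(Document doc, String tag) {
        NodeList nodes = doc.getElementsByTagName(tag);
        return nodes.getLength() == 0 ? null : nodes.item(0).getTextContent();
    }

    // Count the elements with the given tag anywhere in the document.
    public static int count(Document doc, String tag) {
        return doc.getElementsByTagName(tag).getLength();
    }

    public static void main(String[] args) {
        System.out.println("HTML (presentation only): " + HTML);
        Document doc = parse(XML);
        System.out.println("name: " + firstText(doc, "name"));
        System.out.println("ingredients: " + count(doc, "ingredient"));
    }
}
```

Nothing in the HTML version tells a program which text is the drink's name; it would have to guess from the bold tag. With the XML version, the program simply asks for the name element.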

This article will present the history of markup languages and how XML came to be. We'll look at sample data in HTML and move gradually into XML, demonstrating why it provides a superior way to represent data. We'll explore the reasons you might need to invent a custom markup language, and I'll teach you how to do it. We'll cover the basics of XML notation, and how to display XML with two different sorts of style languages. Then, we'll dive into the Document Object Model, a powerful tool for manipulating documents as objects (or manipulating object structures as documents, depending upon how you look at it). We'll go over how to write Java programs that extract information from XML documents, with a pointer to a free program useful for experimenting with these new concepts. Finally, we'll take a look at an Internet company that's basing its core technology strategy on XML and Java.

Is XML for you?
Though this article is written for anyone interested in XML, it has a special relationship to the JavaWorld series on XML JavaBeans. (See Resources for links to related articles.) If you've been reading that series and aren't quite "getting it," this article should clarify how to use XML with beans. If you are getting it, this article serves as the perfect companion piece to the XML JavaBeans series, since it covers topics untouched therein. And, if you're one of the lucky few who still have the XML JavaBeans articles to look forward to, I recommend that you read the present article first as introductory material.

A note about Java
There's so much recent XML activity in the computer world that even an article of this length can only skim the surface. Still, the whole point of this article is to give you the context you need to use XML in your Java program designs. This article also covers how XML operates with existing Web technology, since many Java programmers work in such an environment.

XML opens the Internet and Java programming to portable, nonbrowser functionality. XML frees Internet content from the browser in much the same way Java frees program behavior from the platform. XML makes Internet content available to real applications.

Java is an excellent platform for using XML, and XML is an outstanding data representation for Java applications. I'll point out some of Java's strengths with XML as we go along.

Let's begin with a history lesson.

The origins of markup languages
The HTML we all know and love (well, that we know, anyway) was originally designed by Tim Berners-Lee at CERN (le Conseil Européen pour la Recherche Nucléaire, or the European Laboratory for Particle Physics) in Geneva to allow physics nerds (and even non-nerds) to communicate with each other. HTML was released in December 1990 within CERN, and became publicly available in the summer of 1991 for the rest of us. CERN and Berners-Lee gave away the specifications for HTML, HTTP, and URLs, in the fine old tradition of Internet share-and-enjoy.

Berners-Lee defined HTML in SGML, the Standard Generalized Markup Language. SGML, like XML, is a metalanguage -- a language used for defining other languages. Each so-defined language is called an application of SGML. HTML is an application of SGML.

SGML emerged from research done primarily at IBM on text document representation in the late '60s. IBM created GML ("Generalized Markup Language"), a predecessor language to SGML, and in 1978 the American National Standards Institute (ANSI) created its first version of SGML. The first industry standard based on that work was released in 1983, the draft international standard followed in 1985, and the final international standard (ISO 8879) was published in 1986. Interestingly enough, the first SGML standard was published using an SGML system developed by Anders Berglund at CERN, the organization that, as we have seen, gave us HTML and the Web.

SGML is widely used in government and in large industries such as aerospace, automotive, and telecommunications. It serves as a document standard at the United States Department of Defense and the Internal Revenue Service. (For readers outside the US, the IRS are the tax guys.)

Albert Einstein said everything should be made as simple as possible, and no simpler. The reason SGML isn't found in more places is that it's extremely sophisticated and complex. And HTML, which you can find everywhere, is very simple; for a lot of applications, it's too simple.




Copyright © 2003 JavaWorld.com, an IDG company